Team Blog: Week Four

Week Four
Author

Kailyn Hogan

Published

June 9, 2023

Week Four for the Housing Team

AI-Driven Housing Evaluation for Rural Community Development

This blog outlines where the Housing Team is at after week four of DSPG. We have made a lot of progress in the past four weeks, and we are excited to share it with you!

Project Overview

This is the project plan we came up with the first week of DSPG. This project is intended to span over three years with DPSG, and different interns will be working on it in the coming years. Thus, the project plan is ambitious for this summer.

Problem Statement

The absence of a comprehensive and unbiased assessment of housing quality in rural communities poses challenges in identifying financing gaps and effectively allocating resources for housing improvement. Consequently, this hinders the overall well-being and health of residents, impacts workforce stability, diminishes rural vitality, and undermines the economic growth of Iowa. Moreover, the subjective nature of evaluating existing housing conditions and the limited availability of resources for thorough investigations further compound the problem. To address these challenges, there is a pressing need for an AI-driven approach that can provide a more accurate and objective evaluation of housing quality, identify financing gaps, and optimize the allocation of local, state, and federal funds to maximize community benefits.

Utilizing web scraping techniques to collect images of houses from various assessor websites, an AI model can be developed to analyze and categorize housing features into good or poor quality. This can enable targeted investment strategies. It allows for the identification of houses in need of improvement and determines the areas where financial resources should be directed. By leveraging AI technology in this manner, the project seeks to streamline the housing evaluation process, eliminate subjective biases, and facilitate informed decision-making for housing investment and development initiatives in rural communities

Goals and Objectives

  1. Image Gathering

    • Zillow

    • Realtors.com

    • County Assessor Pages

      • Vangaurd: Independence

      • Beacon Schneider: Slater, New Hampton, and Grundy Center

  2. Build, Train, and Test AI Models

    1. Vegetation Model

    2. Siding Model

    3. Gutter Model

  3. Create Database of Housing Information

    • Zillow

    • Realtors.com

    • County Assessor Pages

      • Vanguard: Independence

      • Beacon Schneider: Slater, New Hampton, and Grundy Center

Our Progress

We have been making good progress to complete the goals and objectives we outlined above. Since the beginning of the Data Science for the Public Good Program, we have been expanding our knowledge of data science, particularly in areas that relate to this housing project. We have been learning and covering new concepts through Data Camp. We have also watched two webinars on TidyCensus training, as well as started creating AI Models to practice with.

Data Camp Training:

  1. GitHub Concepts

  2. AI Fundamentals

  3. Introduction to R

  4. Intermediate R

  5. Introduction to the Tidyverse

  6. Web Scraping in R

  7. Introduction to Deep Learning with Keras

TidyCensus Demographic Data Collection:

One of the first steps in our project was to explore the available demographic data in our selected cities and counties. We thought it valuable to understand the demographic data, and we have represented in the plots below:

Creating Test AI Models:

The next step was creating an AI Model. We decided to create an AI Model early in the project before finishing the housing data collection so that we had a better understanding when it came to putting everything together. The AI Model below tests for vegetation in front of houses.

This Week:

In-Person Data Collection

On Tuesday this week, the entire DSPG program went to Slater to practice data collection in person. The housing group took this as an opportunity to collect some housing photos on the ground to use in our AI Model later on.

Google Street View and URLs

We are getting the majority of our photos for the AI to use from Google Street View. Google has an API key that you can use to generate an image for a specific address. We spent the first half of this week pulling addresses from each of our cities and creating URLs to pull the images from Google Street View.

We ran into a couple of problems when doing this, the biggest of which is displayed in the images below. Because we are working with cities in rural areas, there is not Google Street View images available for every street in our cities.

Google Street View information for Grundy Center, Iowa. For reference, population was 2,811 as of 2023.

Google Street View information for Slater, Iowa. For reference, population was 1,639 as of 2023.

Google Street View information for Independence, Iowa. For reference, population was 6,307 as of 2023.

Google Street View information for New Hampton, Iowa. For reference, population was 3,368 as of 2023.

Below is a sample from the tables we created containing the URLs to grab the images from Google Street View.

Web Scraping

Once we were finished collecting addresses and generating URLs, we moved on to scraping the web for more images. We decided to grab images from Zillow, Realtors.com, and the County Assessor pages for our cities. We were able to successfully scrape images from Zillow this week.

When web scraping, we ran into a problem with blurred houses. Upon some research, we found out that some home owners pay Google to have their home blurred for Google Street View.

The next websites we are scraping hold the Iowa Assessors housing data for our four cities. We found a webpage that has links to every counties assessor website. The key at the bottom shows where the data is held. Yellow means the data is held by Vanguard. Blue means the data is online. For most of Iowa’s counties, this means that it is held by Beacon Schneider.

Web Scraping Issues

Independence had issues where the house number was listed as 100/101 and also had 100 1/2. Thankfully the function to grab the Google images ran all the way but said there were 50 errors including both addresses with two house numbers and with half signs. If you remove one of the address numbers or remove the 1/2 (basically removing the / sign) the image url still brings you to a Google image. We could possibly go back through and grab these URL’s and alter them to try and grab these addresses if necessary.

Happies

  • had a great meeting with Erin Olson-Douglas

  • finished collecting and creating URL addresses for Google Street View images

  • Zillow owns Trulia so we don’t have to web scrape both sites :)

  • able to successfully scrape some things from Zillow !

Crappies

  • Web Scraping

  • Beacon and Vanguard have anti-web scraping protections

  • Angelina’s Excel is stupid

Future Plans and Next Steps

Once we are able to scrape enough images off of Zillow, Realtors.com, and the assessor pages, we will be able to move on with creating AI Models. The diagram below outlines how the AI Models will work in the next steps of out project.